k-NNN: Nearest Neighbors of Neighbors for Anomaly Detection
Anomaly detection aims at identifying images that deviate significantly from
the norm. We focus on algorithms that embed the normal training examples in a
feature space and, given a test image, detect anomalies based on the feature
distance to the k nearest training neighbors. We propose a new operator that
takes into account the varying structure and importance of the features in the
embedding space. Interestingly, this is done by taking into account not only
the nearest neighbors, but also the neighbors of these neighbors (k-NNN). We
show that by simply replacing the nearest neighbor component in existing
algorithms by our k-NNN operator, while leaving the rest of the algorithms
untouched, each algorithm's own results are improved. This is the case both for
common homogeneous datasets, such as flowers or nuts of a specific type, as
well as for more diverse datasets
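
For orientation, the baseline that the abstract builds on (scoring a test image by its feature distance to the k nearest training neighbors) might look roughly like the sketch below. This is not the k-NNN operator itself, whose details the abstract does not give. The function name `knn_anomaly_score`, the use of Euclidean distance, and averaging over the k distances are all assumptions made for illustration.

```python
import numpy as np

def knn_anomaly_score(train_feats, test_feats, k=3):
    """Baseline k-NN anomaly score (illustrative, not the paper's k-NNN).

    train_feats: (n_train, d) embeddings of normal training examples.
    test_feats:  (n_test, d) embeddings of test examples.
    Returns one score per test example; larger means more anomalous.
    """
    # Pairwise Euclidean distances, shape (n_test, n_train).
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Keep the k smallest distances in each row and average them.
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Toy usage: three "normal" points near the origin, one near and one far test point.
train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
test = np.array([[0.0, 0.0], [10.0, 10.0]])
scores = knn_anomaly_score(train, test, k=2)
```

In this toy setup the far test point receives the larger score, which is the behavior a distance-based detector relies on.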
CLID: Controlled-Length Image Descriptions with Limited Data
Controllable image captioning models generate human-like image descriptions
while allowing some control over the generated captions. This paper focuses
on controlling the caption length, i.e., producing either a short and concise description or a
long and detailed one. Since existing image captioning datasets contain mostly
short captions, generating long captions is challenging. To address the
shortage of long training examples, we propose to enrich the dataset with
varying-length self-generated captions. These, however, might be of varying
quality and are thus unsuitable for conventional training. We introduce a novel
training strategy that selects the data points to be used at different times
during training. Our method dramatically improves length-control ability
while achieving state-of-the-art (SoTA) caption quality. Our
approach is general and is also shown to be applicable to paragraph generation
Is Image Memorability Prediction Solved?
This paper deals with the prediction of the memorability of a given image. We
start by proposing an algorithm that reaches human-level performance on the
LaMem dataset, the only large-scale benchmark for memorability prediction. The
suggested algorithm is based on three observations we make regarding
convolutional neural networks (CNNs) that affect memorability prediction.
Having reached human-level performance, we were humbled and asked ourselves
whether we had indeed solved memorability prediction, and we answered this
question in the negative. We studied a few factors and made some
recommendations that should be taken into account when designing the next
benchmark
CloudWalker: Random walks for 3D point cloud shape analysis
Point clouds are gaining prominence as a method for representing 3D shapes,
but their irregular structure poses a challenge for deep learning methods. In
this paper we propose CloudWalker, a novel method for learning 3D shapes using
random walks. Previous works attempt to adapt Convolutional Neural Networks
(CNNs) or impose a grid or mesh structure on 3D point clouds. This work
presents a different approach for representing and learning the shape from a
given point set. The key idea is to impose structure on the point set by
taking multiple random walks through the cloud to explore different regions of the
3D object. Then we learn a per-point and per-walk representation and aggregate
multiple walk predictions at inference. Our approach achieves state-of-the-art
results for two 3D shape analysis tasks: classification and retrieval
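
To make the idea of a random walk through a point cloud concrete, here is a minimal sketch. The abstract does not specify how the next point is chosen, so the rule used here (step to a random, preferably unvisited, point among the k nearest neighbors of the current point) is an assumption, as are the function name `random_walk` and its parameters. The learning and aggregation stages are not shown.

```python
import numpy as np

def random_walk(points, walk_len, k=8, rng=None):
    """Return a list of point indices forming one walk (illustrative only).

    points: (n, 3) array of distinct 3D points, with n >= 2.
    walk_len: number of points in the walk.
    k: size of the neighborhood considered at each step (assumed rule).
    """
    rng = np.random.default_rng() if rng is None else rng
    current = int(rng.integers(len(points)))
    walk = [current]
    visited = {current}
    for _ in range(walk_len - 1):
        # Distances from the current point to every point in the cloud.
        dists = np.linalg.norm(points - points[current], axis=1)
        # The k nearest neighbors, skipping the current point itself.
        neighbors = [int(j) for j in np.argsort(dists)[1:k + 1]]
        # Prefer unvisited neighbors; fall back to any neighbor if none remain.
        fresh = [j for j in neighbors if j not in visited]
        current = int(rng.choice(fresh if fresh else neighbors))
        walk.append(current)
        visited.add(current)
    return walk

# Toy usage: several walks over a random cloud, each exploring a different region.
cloud = np.random.default_rng(0).random((50, 3))
walks = [random_walk(cloud, 20, k=5, rng=np.random.default_rng(s)) for s in range(4)]
```

Each walk yields an ordered sequence of points, which is what gives an otherwise unordered point set a structure that a sequence model could consume.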